*PLEASE MAKE SURE THAT YOU SELECT `File -> Trust Notebook` BEFORE PROCEEDING WITH READING THIS NOTEBOOK*

Exploratory Data Analysis

This Exploratory Data Analysis notebook gives an overview of the provided dataset, which is investigated further throughout the project. Along the way, several research questions are formulated.

The research questions aim to produce findings that create business value for companies with regard to travel and business planning.

Installations and imports

First, the required packages are installed and imported. Moreover, the visualization parameters are adjusted for an enhanced experience.

Preliminary inspection of data

As a first step towards understanding the datasets, the SweetViz visualisation package is used. Its compare functionality is used here to get preliminary insights from January 2019 and January 2020 and to contrast them visually with one another.

The visualisations provided by SweetViz give a very good overview of what the datasets look like, as they display a separate comparison for each feature of the datasets. Thanks to these graphs, we can decide which research questions will be studied in the exploratory analysis.

In the report, the DAY_OF_MONTH and DAY_OF_WEEK features are displayed first. The share of records is similar for each day of the month, although the last day of January shows an increase, and it is also similar for each day of the week. Hence, it will be interesting to explore which days are significant for delays and cancellations, in order to find out whether there is a relationship between these features.

In addition, looking at the values of OP_UNIQUE_CARRIER for January 2019 and January 2020, it can be seen that the same airlines operate the flights in both datasets. A few airlines operated more flights than the rest, but most airlines have similar numbers. Therefore, in the exploratory analysis we will study the likelihood of a flight being delayed or cancelled depending on the airline/carrier.

Furthermore, looking at the airports in ORIGIN and DESTINATION, the airports with the highest percentage of total flights are the same in both datasets. Also, the percentage of flights per airport is very similar for all of them. Hence, we will look into which airports are most likely to have their flights cancelled.

Moreover, since the datasets record where each flight starts and ends, it is important to determine which routes perform the worst, namely those with the highest number of cancellations and delays.

After adding the weather data of the origin and destination airports to the provided datasets, the SweetViz visualisations show that the weather conditions vary considerably from year to year. Therefore, the exploratory analysis will examine whether the weather conditions have an impact on the airlines' activities and, if so, which conditions are the most relevant.

Routes exploration

The preliminary inspection of the data has been highly rewarding, since it allowed us to formulate interesting research questions, each of which offers business value in its own way.

The following section addresses which routes are prone to cancellations or delays, so that they can be avoided on future business trips. Knowing this, companies can reschedule accordingly or plan their travel through other flights or other means of transport.

We will find out which routes are the worst for each of the analysed years - 2019 and 2020 - in absolute and relative numbers, and display a geographical plot to ease the analysis.

Firstly, the route feature is added to our datasets:
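A minimal sketch of how such a route feature could be derived with pandas. The column names `ORIGIN` and `DEST` and the toy rows are assumptions made for illustration, not the notebook's actual code or data.

```python
import pandas as pd

# Toy stand-in for the flight records (column names are assumptions).
flights = pd.DataFrame({
    "ORIGIN": ["LAX", "SFO", "ORD", "LAX"],
    "DEST": ["SFO", "LAX", "IND", "SFO"],
})

# A route is directional: LAX-SFO and SFO-LAX are kept as different routes.
flights["ROUTE"] = flights["ORIGIN"] + "-" + flights["DEST"]

# Number of flights recorded on each route.
flights_per_route = flights["ROUTE"].value_counts()
print(flights_per_route)
```

Keeping the direction in the route name makes it possible to notice later when the same pair of airports shows up in both directions.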

As a first analysis, we look into the normalized cancellations per route to see if there are any alarming values.

At the moment there are no alarming values to point out. Let's dive deeper into the research question by looking at the routes with the most cancellations.

Next, we look into the normalized arrival delays per route to see if there are any alarming values.

At first glance, the routes with the fewest delayed arrivals have a slightly smaller percentage of delays than their counterparts have for cancellations. In any case, there are no alarming values to point out. It's time to dive deeper into the research question by looking at the routes with the most delayed arrivals. But first, some preparation needs to be done.

Preparation of plots

To properly make geographical plots and plot the absolute as well as relative number of flights cancelled per route, the code is going to be defined into functions for reusability purposes.

PS: Thanks to Bob Haffner for inspiration on plotting https://stackoverflow.com/questions/56550313/how-to-plot-routes-between-pairs-of-starting-and-ending-geospatial-points-using

The function below plots horizontal bar charts in the same cell. Quite handy.

A function is needed to put it all together: our previously used data, some weather features and the geographical data.

The following function centers the map on the average coordinates of the routes to be plotted and then plots the airports and the routes, all in colorblind-friendly colors. The function restricts the plotting to the worst airports and draws the most frequently cancelled routes with more intensity.
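The two ideas described above, centering on the average coordinates and scaling line intensity with the number of cancellations, can be sketched without any mapping library. The helper names `map_center` and `line_opacity`, the approximate coordinates and the counts are all hypothetical; the notebook's own `plot_map` may work differently.

```python
from statistics import mean

# Hypothetical route records: approximate endpoint coordinates and made-up counts.
routes = [
    {"route": "ORD-IND", "origin": (41.98, -87.90), "dest": (39.72, -86.29), "cancelled": 30},
    {"route": "LAX-SFO", "origin": (33.94, -118.41), "dest": (37.62, -122.38), "cancelled": 15},
]

def map_center(routes):
    """Average latitude and longitude over all route endpoints."""
    points = [point for r in routes for point in (r["origin"], r["dest"])]
    return mean(lat for lat, _ in points), mean(lon for _, lon in points)

def line_opacity(cancelled, max_cancelled, floor=0.2):
    """Scale opacity linearly with the count, never dropping below `floor`."""
    return floor + (1.0 - floor) * cancelled / max_cancelled

center = map_center(routes)
worst = max(r["cancelled"] for r in routes)
opacities = {r["route"]: round(line_opacity(r["cancelled"], worst), 2) for r in routes}
print(center, opacities)
```

The opacity floor is a design choice in this sketch: it keeps the least-cancelled routes visible instead of letting them fade out entirely.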

Cancellations

Having defined the functions needed, it's time to look at the cancelled routes. First, let's take a look at the routes with the most cancellations in absolute terms:

Looking at the above plots, it is interesting that these routes involve only 11 and 15 airports respectively. The cancellations are very localised, and in some cases the same route was cancelled in both directions, such as LAX-SFO or ORD-IND. This suggests that some unique events may have driven the cancellations, which will be looked into later in the other research questions.

Moreover, it can be seen that the North East (Illinois, New York, Indiana, Virginia...) of the US has a big concentration of the cancellations in absolute terms.

Now let's take a look at the most cancelling routes relative to the number of flights they operate. In other words, the routes with the highest probability (from a frequentist point of view) of being cancelled.

This can certainly be insightful, as it allows us to check whether the above-mentioned routes are in fact the worst with regard to cancellations or are simply the most popular routes.
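The difference between the absolute and the relative view can be illustrated with a small sketch. The data below is made up: a busy route with a few cancellations and a small route where every flight is cancelled. The column names `ROUTE` and `CANCELLED` are assumptions.

```python
import pandas as pd

# Made-up data: 10 flights on a busy route, 2 flights on a small one.
flights = pd.DataFrame({
    "ROUTE": ["LGA-ORD"] * 10 + ["SUN-SLC"] * 2,
    "CANCELLED": [1, 1, 1] + [0] * 7 + [1, 1],
})

# Flights and cancellations per route, then the frequentist cancellation rate.
per_route = flights.groupby("ROUTE")["CANCELLED"].agg(flights="count", cancelled="sum")
per_route["cancel_rate"] = per_route["cancelled"] / per_route["flights"]

most_cancelled_abs = per_route["cancelled"].idxmax()    # most cancellations in total
most_cancelled_rel = per_route["cancel_rate"].idxmax()  # highest share of cancellations
print(per_route)
```

In this toy case the busy route wins in absolute terms while the small route wins in relative terms, which is the effect the comparison is meant to expose.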

The results are quite surprising... Relative to the total number of flights, almost none of the previously analysed routes are the worst.

While the airports with the most cancelled routes in absolute terms are (mostly) important hubs or at least big cities, the airports in the worst routes above are not only big transport hubs but also regional airports with low traffic during the winter months - and higher chances of cancelling flights.

There are no time patterns, although there are routes cancelled concerning the same regional airports in both years. This is the case of SUN, Friedman Memorial Airport, in the state of Idaho.

Regarding the hubs, the most notable ones are ORD in Chicago and SFO in San Francisco. These airports have several routes with a large proportion of cancellations, although they are among the busiest airports in the US, which makes it reasonable that they have so many cancellations.

It can be insightful to see the most-cancelling routes on a map, which makes them easier to locate.

Therefore, the datasets are prepared and plotted in the next cells.

With the above defined function plot_map, we can see the most-cancelling routes for 2019 and 2020.

Regarding the plot, the white circles represent the origin whereas the black circles represent the destinations. Moreover, the blue lines represent the 2019 cancelled flights while the orange lines represent the 2020 cancelled flights.

To fully understand the plot, it is important to notice that the more cancellations, the more intense the colors are.

The map is interactive, so feel free to zoom in or out and, for instance, get a closer look around Chicago - the Great Lakes region.

At first glance it is remarkable that the cancelled flights during January 2019 were much more concentrated than those in 2020. There are fewer airports involved and there are quite a few cases with the same route in both directions.

Most of the routes do not have the same issues in 2019 as in 2020, which makes it harder to estimate from historical data which routes will have cancellations the next year. Moreover, it is quite noticeable that most of the airports are in grey, that is, they are both departure and arrival airports.

All in all, there are no atemporal patterns that could be derived from looking at the trends. However, it can be very helpful when planning a business trip to consider whether to use one of the routes plotted, especially when involving regional airports.

Delays

Having derived the insights for the cancelled routes, it's time to look at the delays. First, let's take a look at the most (absolute) delayed routes:

Looking at the above plots, it is interesting that these routes involve only 9 and 11 airports respectively. This means that only 9 airports (in 2019) account for the 20 most delayed routes, which is a high concentration.

Moreover, it can be seen that, more precisely than in the cancellations analysis, it is the bigger hubs that account for the most delays. This is the case of New York, Chicago, San Francisco or Seattle, for instance.

Now let's take a look at the most delayed routes relative to the number of flights they operate. In other words, the routes with the highest probability (again, from a frequentist point of view) of being delayed.

This can certainly be insightful for checking whether hubs are prone to having delayed routes.

The results are breathtaking... Relative to the total number of flights, almost none of the previously analysed routes are the worst, same as for cancellations.

The rate of delayed routes for 2019 is certainly high, with the top 20 routes having every flight delayed. As for 2020, the rates are not as high, though they are far from low, with only four routes dropping to almost 80%.

Regarding the hubs, the most notable ones are HOU as well as DAL, from Texas, but only for 2019. There are no airports that seem to have an intrinsic problem with delays in 2020.

It can be insightful to see the most-delayed routes on a map, which makes them easier to locate.

Therefore, the datasets are prepared and plotted in the next cells.

With the above defined function plot_map, we can see the most-delaying routes for 2019 and 2020.

Regarding the plot, the white circles represent the origin whereas the black circles represent the destinations. Same as in Cancellations, the blue lines represent the 2019 delayed flights while the orange lines represent the 2020 delayed flights.

To fully understand the plot, it is important to notice that the more delays, the more intense the colors are.

The map is interactive, so feel free to zoom in or out and, for instance, get a closer look around Chicago - the Great Lakes region.

At first glance it is remarkable that the cancelled flights during January 2019 were much more concentrated than those in 2020. There are fewer airports involved and there are quite a few cases with the same route in both directions.

Most of the routes do not have the same issues in 2019 as in 2020, which makes it harder to estimate from historical data which routes will have cancellations the next year. Moreover, it is quite noticeable that most of the airports are in grey, that is, they are both departure and arrival airports.

Closing the routes topic: Are there any routes that are more prone to have flights delayed or cancelled?

It seems that, both in terms of cancellations and delays, this is not strictly the case. Routes that include a small regional airport, or airports in colder regions, tend to have a lot of delays and cancellations, though this is not true for all routes involving these airports. Moreover, big hubs do have a larger number of cancellations, but this is only because of the very large number of flights going to and from these airports (this is the case for LGA in New York or SFO in San Francisco, for instance).

There are no atemporal patterns that could be derived from looking at the trends. This research question could be indeed helpful when planning a business trip to consider whether to use one of the routes plotted, especially when involving regional airports.

Thus, overall, there is no strong correlation between given routes and cancellations or delays, but there are routes within specific airports, displayed in the geographical maps, that businesses should be wary of.

Airlines and carriers exploration

This section aims to answer the following question: Are there any airlines that are most likely to have their flights delayed or cancelled?

In order to answer this question, the analysis focuses on developing visualisations that help us find a relationship between the different airlines and whether they have a higher number of delayed or cancelled flights. To this end, the January 2019 and January 2020 datasets will be compared.

As we want to have the highest amount of data possible regarding delays and cancellations and this part of the exploratory analysis is only focused on the airlines/carriers, the datasets that do not contain weather data will be used since they have a higher number of rows.

Departure Delays

The possible relationship between the delays and airlines will be first explored in this part of the analysis. As a first approach, the number of total flights per airline as well as the number of delayed flights will be visualized.

Starting with January 2019, the total number of flights and the delayed ones will be put side by side to compare them.

It can be seen that, in almost all cases, the airlines with the highest number of flights are also the ones with the highest number of delays.

We will now see if this trend is also followed in January 2020.

The same happens for January 2020. So, for both years, the airlines that have the highest number of delayed flights correspond to the ones that have the highest number of flights in total. Therefore, a different approach is taken, so as to find which carriers have a higher probability of flight delay not based on absolute numbers but on percentage.
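The switch from absolute numbers to percentages can be sketched as follows. The 0/1 flag column `DEP_DEL15` is an assumed name for "departure delayed", and the rows are made up for illustration.

```python
import pandas as pd

# Made-up flights; DEP_DEL15 is assumed to be 1 when the departure was delayed.
flights = pd.DataFrame({
    "OP_UNIQUE_CARRIER": ["DL", "DL", "DL", "DL", "MQ", "MQ"],
    "DEP_DEL15": [0, 0, 0, 1, 1, 0],
})

# The mean of a 0/1 flag is the share of delayed flights; scale it to a percentage.
delay_pct = (
    flights.groupby("OP_UNIQUE_CARRIER")["DEP_DEL15"]
    .mean()
    .mul(100)
    .sort_values(ascending=False)
)
print(delay_pct)
```

Here DL operates more flights but has the lower delay percentage, so ranking by percentage removes the effect of carrier size.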

As seen in the displayed graphs, in most cases, the percentage of delayed flights per airline varies very notably comparing January 2019 and January 2020. Only the carriers DL and HA remain in their position as the airlines with the lowest percentage of delayed flights.

In order to gain insights into how the number of delayed flights per airline varies over time, the following graphs are displayed for January 2019 and January 2020:

As seen in the graphs, there are certain days when the number of delayed flights spikes for most of the airlines. Hence, the delays are more related to the circumstances of a certain day than to the carriers themselves.
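One way to look for such shared spike days in a table rather than a plot is sketched below. The column names and the data are assumptions, and the rule "above the carrier's own monthly average" is only one possible definition of a spike.

```python
import pandas as pd

# Made-up flights over two days for two carriers.
flights = pd.DataFrame({
    "DAY_OF_MONTH": [1, 1, 1, 1, 2, 2, 2, 2],
    "OP_UNIQUE_CARRIER": ["DL", "DL", "MQ", "MQ", "DL", "DL", "MQ", "MQ"],
    "DEP_DEL15": [0, 0, 0, 1, 1, 1, 1, 1],
})

# Rows are days, columns are carriers, values are delayed flights on that day.
daily = flights.pivot_table(
    index="DAY_OF_MONTH", columns="OP_UNIQUE_CARRIER",
    values="DEP_DEL15", aggfunc="sum", fill_value=0,
)

# Days on which every carrier is above its own average point to a shared cause.
shared_spike_days = daily.index[(daily > daily.mean()).all(axis=1)].tolist()
print(daily)
print(shared_spike_days)
```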

Cancellations

The possible relationship between cancellations and airlines is now explored in this part of the analysis.

The same reasoning as in the section above is followed. Therefore, in order to determine this possible relationship, we will look directly at the cancellation percentage for each carrier in both years rather than at the absolute numbers.

Looking at the charts above, the percentage of cancellations varies a lot from year to year: the highest values are around 8% for 2019 and above 3% for 2020. Furthermore, apart from the airline MQ, which remains the carrier with the highest percentage of cancellations, in most cases the percentage of cancellations per airline varies greatly between the two years.

In order to gain insights into how the number of cancelled flights per airline varies over time, the following graphs are displayed for January 2019 and January 2020:

As seen in the charts for both years, there are certain days when the number of cancelled flights spikes for many airlines at the same time.

In conclusion, answering the question: Are there any airlines that are most likely to have their flights delayed or cancelled?

No, it cannot be stated that the likelihood of a flight being cancelled or delayed depends on the airline operating it. As shown in the analysis, the percentage of delayed flights varies a lot for the different carriers from January 2019 to January 2020. Furthermore, the charts displaying how the number of delays and cancellations varied over time for each airline made it clear that there are certain days on which most of the airlines experience a big increase in these numbers. The following section of the exploratory analysis studies the relationship between time and cancellations and delays in more detail.

Impact of days of the week on cancellations and delays

This section aims to answer the research question: Does flying on a specific day of the week mean that flights are more likely to be delayed or cancelled?

To answer this question, we will examine the data from both January 2019 and 2020 and try to spot any trends that we could draw conclusions from. We will also develop visualisations that will help us get a better understanding of the data.

2019

Let's first analyze delayed flights in January 2019. Which days of the week have the most delays?

When we sum all the delays within January 2019 per weekday, Thursdays, Wednesdays and Tuesdays accumulated the most delays. However, January 2019 had 5 Tuesdays, Wednesdays and Thursdays and only 4 Mondays, Fridays, Saturdays and Sundays. Dividing the sums by the number of occurrences of each weekday in the month thus gives an average number of delays per weekday and provides a better sense of whether a specific weekday had more delays than others.
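The weekday counts quoted above can be checked with the standard library, and the same counts can serve as divisors. In the sketch below, the helper `weekday_occurrences` is hypothetical and the delay totals are made-up numbers, used only to show how averaging can change a ranking.

```python
import calendar
from collections import Counter

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

def weekday_occurrences(year, month):
    """Count how often each weekday occurs in the given month."""
    days_in_month = calendar.monthrange(year, month)[1]
    return Counter(
        WEEKDAY_NAMES[calendar.weekday(year, month, day)]  # 0 is Monday
        for day in range(1, days_in_month + 1)
    )

occ_2019 = weekday_occurrences(2019, 1)
occ_2020 = weekday_occurrences(2020, 1)

# Made-up monthly delay totals per weekday, for illustration only.
total_delays = {"Tuesday": 50_000, "Saturday": 44_000}
avg_delays = {day: total / occ_2019[day] for day, total in total_delays.items()}
print(occ_2019, avg_delays)
```

With these invented totals, Tuesday leads on the monthly sum but Saturday leads on the per-day average, because Tuesday occurs five times and Saturday only four.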

We can see that the order has now changed slightly: the two weekdays with the most delayed flights on average are still Thursday and Wednesday, but after the division the average for Tuesdays is lower and Fridays have moved into the top 3 instead.
It can also be observed (in both of the tables above) that both the sum and the average number of departure delays are lower than those of arrival delays.

Let's now look at each day of the month and at how the delays look both as a sum of all the delayed flights on that specific day and as a percentage, relative to the total number of flights in the US on that day:

Sorted by arrival delay, the table shows for each day of the month how many flights were delayed, both as a sum and as a percentage of all the flights on that day.
We can observe that a lot of flight delays were recorded in the period between the 21st and the 25th of January. Let's visualize the data in this table with a plot:

As the two plots show, the values for arrival and departure delays are similar and follow the same trend. We can also see that Saturdays (coloured in light blue) are the days on which the fewest delays occurred.

There is a high number of delays at the beginning of the year, around the 1st and 2nd of January. This could be connected to an increased number of people travelling after New Year's Eve and New Year's Day, which fell on a Monday and a Tuesday.
As mentioned before, there is a spike in flight delays around the 21st - 25th of January. After some investigation, we found out from a news report by The Guardian that this was related to the ongoing US government shutdown, which left 800,000 federal employees without pay. Among them were more than 400,000 workers, including air traffic controllers and airport security workers. A lack of Transportation Security Administration officers forced some airports to close terminals. Flights into mainly three US airports were delayed due to staffing issues, namely LaGuardia (New York), Philadelphia and Newark (New Jersey). This caused some arriving flights to be delayed by an average of 1 hour and 26 minutes.
https://www.theguardian.com/world/2019/jan/25/flight-delays-laguardia-newark-philadelphia-shutdown

Let's now analyze cancelled flights in January 2019. Which days of the week have the most cancellations?

Again, we divide the sums by the number of occurrences of each weekday in the month:

In the case of cancellations, Wednesdays, Mondays and Sundays had the most cancelled flights in January 2019. Let's now look at each day of the month and at how the cancellations look both as a sum of all the cancelled flights within a day and as a percentage, relative to the total number of flights in the US on that day:

As with the delays, the Percentual column follows a similar order to the columns with the sums. Looking at the table, we can see that between the 19th-22nd and the 28th-31st of January there was a substantial increase in the number of cancellations compared to the rest of the month. Let's visualize this and then investigate why this was the case:

The plot confirms our observation in a visual way. After doing some digging, we found out that the vast majority of these spikes in cancellations were caused by adverse weather conditions.

January 19th - 22nd: Thousands of flights cancelled due to snow storms and extreme cold (across mid-west and north-east of US, affecting cities such as Chicago, New York, Boston, etc.) https://edition.cnn.com/2019/01/20/weather/winter-weather-sunday-wxc/index.html https://www.cnbc.com/2019/01/18/winter-storm-harper-airlines-waive-change-fees-ahead-of-bad-weather.html

January 28th - 31st: Another, even bigger wave of snow storms and cold, with temperatures down to between -20 and -40 degrees Celsius, this time predominantly around Chicago and the Illinois area, with practically all local flights cancelled: https://www.garda.com/crisis24/news-alerts/198441/us-almost-1000-flights-canceled-in-chicago-january-28-update-1 https://www.nbcchicago.com/news/local/flights-canceled-chicago-airports-ohare-midway/5999/

Because of these extreme and perhaps unpredictable weather events, such as local snow storms, there does not seem to be a clear trend as to whether a certain day of the week generally tends to have more delays or cancellations. Let's look at whether this is confirmed when analyzing data from January 2020.

2020

As the analysis done for the January 2020 data is exactly the same as for January 2019, to avoid being repetitive in this part we only comment on observations and insights that we obtain from the data.

January 2020 had 5 Wednesdays, Thursdays and Fridays:

Compared to 2019, where Saturday was the day with the lowest average number of delays, 2020 seems to be the complete opposite. The days with the most delays on average were Saturday, Friday and Thursday. In 2019, Thursdays ranked 1st and Fridays 3rd, which may indicate some pattern.

It seems that at the beginning of the month, delays follow a certain pattern. Midweek, on Tuesdays and Wednesdays, there are clearly fewer delays, and this trend continues throughout the month. Thursdays seem to be similar; however, there is a big outlier on the 16th of January which breaks this trend. Saturdays and Fridays have a consistently high number of delays on average.

The outliers that were observed were mostly weather-related, for instance:
A large winter storm impacting the midwestern United States, in states such as North Dakota, eastern South Dakota, northeastern Nebraska and western Pennsylvania, causing flight delays.
https://thegate.boardingarea.com/travel-alert-january-2020-large-winter-storm-to-impact-midwestern-united-states/

Let's now look at cancelled flights in January 2020:

This is consistent with the findings for delays and the ranking of the days remained practically the same, with Friday and Saturday being the two dominant days in terms of cancellations. Plotting the sum of cancellations for each day of the month will give us a better indication of this phenomenon:

In the plot above we can see the reason why Fridays and Saturdays had such high values. In terms of cancellations, the reason is not necessarily related to a repeating pattern throughout the month. Instead, it has to do with certain isolated incidents that happened on those days. After investigating past news articles, we found out that on the Friday and Saturday of January 10th - 11th, 400 flights were cancelled and another 500 delayed at the Dallas airport due to heavy thunderstorms and turbulent weather on Friday:
https://www.nbcdfw.com/news/local/turbulent-weather-causes-delays-cancellations-at-dfw-airport/2290241/
On Saturday the 11th, snow storms and strong winds were again reported in Chicago, with 1,200 flights cancelled:
https://chicago.cbslocal.com/2020/01/11/chicago-weather-over-500-flights-canceled-at-ohare-midway-airports-ahead-of-winter-storm/
The following week, again on Friday and Saturday (17th-18th), 1,600 flights were cancelled due to a winter storm with snow, freezing rain and sleet in the Upper Midwest, affecting mostly cities such as Chicago, Minneapolis, Oklahoma City, Kansas City and St. Louis:
https://edition.cnn.com/2020/01/17/weather/storm-forecast-snow-ice-friday/index.html

In general, answering the question: Does flying on a specific day of the week mean that flights are more likely to be delayed or cancelled?

It seems like in terms of cancellations, this is not the case. Based on the evidence provided by the news reports, it seems that the vast majority of cancellations tend to be related to unfavorable weather conditions, such as snow storms or strong winds.

In terms of delays, this was the ranking of the week days in the respective months:
January 2019 - Thursday, Wednesday, Friday, Monday, Sunday, Tuesday, Saturday
January 2020 - Saturday, Friday, Thursday, Monday, Sunday, Wednesday, Tuesday
It is hard to draw conclusions from this. Some days were consistent in the number of delays they accumulated in both years, such as Thursdays and Fridays (high number of delays) and Mondays, Sundays and Tuesdays (relatively low number of delays), but Wednesdays and Saturdays are completely different in this respect. The weekday with the highest number of delays in 2020, Saturday, was the least delay-impacted day in 2019, and a similar phenomenon can be observed for Wednesday.
As was the case with flight cancellations, flight delays are also severely impacted by weather. In January 2019, flights were also impacted by the US government shutdown.

Thus, overall, we cannot conclude that the different days of the week have a decisive impact on delays or cancellations. We would require more historical data to be able to support this hypothesis. The evidence suggests that the main impact on flight disruptions originates in weather conditions. We will explore this in more detail in the following section.

Impact of weather conditions

We wish to answer the question: Do the weather conditions have an impact on the airline's activities? If so, which weather conditions are relevant?

To answer the question, the distributions of the weather data will be plotted for each level of cancellations or delays. To support the arguments, a t-test will be used to compare the means of the two groups.

From the histograms of the weather data, we see that the snow and precipitation columns are positively skewed, so we apply a log transformation to bring them closer to a normal distribution. To avoid taking the log of 0, we add 1 to the values, since log(1) = 0.

We redefine the columns to use the log-transformed values instead.
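A minimal sketch of this transformation, assuming a hypothetical column name `ORIGIN_SNOW` and made-up values. `numpy.log1p(x)` computes log(1 + x), so zeros map to zero.

```python
import numpy as np
import pandas as pd

# Made-up, positively skewed snow values with several zeros.
weather = pd.DataFrame({"ORIGIN_SNOW": [0.0, 0.0, 0.5, 2.0, 30.0]})

# log(1 + x): zero stays zero, large values are compressed.
weather["ORIGIN_SNOW_LOG"] = np.log1p(weather["ORIGIN_SNOW"])

skew_before = weather["ORIGIN_SNOW"].skew()
skew_after = weather["ORIGIN_SNOW_LOG"].skew()
print(weather)
print(skew_before, skew_after)
```

The sketch writes to a new column so both versions can be compared, whereas the notebook overwrites the original columns.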

There are still plenty of values around 0, but it is more balanced on the scale.

Let's inspect the boxplots of the features to see the difference in distributions for cancelled and delayed flights.

For cancelled flights, it is clear that temperature, snow and precipitation are features to consider.

We only observe the origin weather data for delayed departures as destination weather data should not have an effect on this. The same logic is used for arrival delays (we only observe the destination weather data).

For delays, the weather data seems to not be differently distributed, with the exception of precipitation.

The differences found in the boxplots for cancellations are based on intuition and graphical interpretation. To properly validate whether there is indeed a statistically significant difference, a t-test is performed on the features for cancellations and delays.
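As an illustration of such a test, the sketch below compares the mean of one weather feature between two groups using `scipy.stats.ttest_ind`. The data is synthetic, and the choice of Welch's variant (`equal_var=False`) is an assumption, since the text does not say which form of the t-test was used.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic temperatures: cancelled flights are drawn from a colder distribution.
temp_cancelled = rng.normal(loc=-8.0, scale=5.0, size=200)
temp_operated = rng.normal(loc=2.0, scale=5.0, size=2000)

# Welch's t-test does not assume that the two groups have equal variances.
t_stat, p_value = stats.ttest_ind(temp_cancelled, temp_operated, equal_var=False)

# Conventional 5% significance threshold.
significant = p_value < 0.05
print(t_stat, p_value, significant)
```

In practice the same call would be repeated for each weather feature, once with the cancellation flag and once with the delay flag defining the two groups.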

These results are very interesting and show why plots might not always contain all the necessary information. The t-tests show that almost all the weather data features have a statistically significant difference in group means for cancellations.

For delays, which seemed to have the same group means in the plots, the t-test shows statistically significant differences for all features.

In general, answering the question: Do the weather conditions have an impact on the airline's activities? If so, which weather conditions are relevant?

Yes, weather conditions have a statistically significant impact on the airline's activities. While the boxplots show that mostly temperature, snow and precipitation differ strongly in distribution when it comes to cancellations, the t-tests indicate that nearly all features play a significant part when differentiating between cases of cancellations and delays.

Airports that are more likely to cancel

In this section we wish to answer the following question: Which airports are more likely to have cancellations?

To answer this question, we will focus on the cancellation data for both years and analyse whether there is any correlation between cancellations and airports.

Cancellation Counting

Firstly, we will see the cancellation frequency distribution for 2019 and 2020.

At first sight, it seems that there were more cancellations in 2019 than in 2020, even though the total number of flights in 2020 was slightly higher. To get a deeper understanding, let's compute the number of cancellations for both years. Moreover, the cancellation variable is considerably imbalanced, which is something we will have to take into account later in our predictive models.
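The class balance can be quantified with `value_counts(normalize=True)`. The two series below are made-up stand-ins for the `CANCELLED` column of each year, so the resulting shares are illustrative only.

```python
import pandas as pd

# Made-up 0/1 cancellation flags for each year (100 flights each).
cancelled_by_year = {
    "2019": pd.Series([0] * 97 + [1] * 3, name="CANCELLED"),
    "2020": pd.Series([0] * 99 + [1] * 1, name="CANCELLED"),
}

# Share of each class per year; the rows are 0 (operated) and 1 (cancelled).
class_shares = pd.DataFrame({
    year: flag.value_counts(normalize=True) for year, flag in cancelled_by_year.items()
})
print(class_shares)
```

A table like this makes the imbalance explicit: a model that always predicts "not cancelled" would already be right for the large majority of flights.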

Our first impression of the data was right, and evidently for 2019 the cancellation percentage was higher than for 2020, even though for January 2020 the total number of flights was higher than in January 2019. Could this be related to different weather conditions?

Let's investigate further.

Cancellations by Airport

We will display the airports with the most cancellations for 2019 and 2020. Hopefully this will bring more insights to our analysis!

The airport with the most cancellations in 2019 was Chicago O'Hare International Airport, followed by Chicago Midway International Airport and Boston Logan International Airport. Curiously, the top two are located in the state of Illinois.

Again, the airport with the most cancellations in 2020 was Chicago O'Hare International Airport, followed by Dallas Fort Worth International Airport. Remarkably, Chicago Midway International Airport has a very similar cancellation rate to the previous year. Either way, again 2 out of the top 3 airports with the most cancellations are located in Illinois, more precisely in the Chicago metropolitan area.

Correlation Analysis

The previous study has shown that in both years Chicago's airports had an issue with cancellations, suggesting that there could be a correlation between those airports and the cancellation rate. Nonetheless, to avoid jumping to conclusions, we will perform a correlation study to assess whether the airport itself could be a cause of a cancellation event.
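Since airports are categorical, they have to be encoded before a correlation can be computed. The text does not say how this was done, so the sketch below shows one possible approach under that assumption: one 0/1 indicator column per airport, each correlated with the cancellation flag. The data is made up.

```python
import pandas as pd

# Made-up flights from two airports.
flights = pd.DataFrame({
    "ORIGIN": ["ORD", "ORD", "ORD", "ORD", "DFW", "DFW", "DFW", "DFW"],
    "CANCELLED": [1, 1, 0, 0, 1, 0, 0, 0],
})

# One indicator column per airport, so no artificial ordering is imposed on the codes.
indicators = pd.get_dummies(flights["ORIGIN"], prefix="ORIGIN", dtype=float)

# Pearson correlation of each airport indicator with the cancellation flag.
corr_with_cancelled = indicators.corrwith(flights["CANCELLED"])
print(corr_with_cancelled)
```

Even in this toy example, where ORD has twice the cancellation rate of DFW, the correlation is only about 0.26, which shows how a visible difference in rates can coexist with a weak correlation coefficient.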

Indeed, the correlation matrix gives us interesting insights: despite our first expectations, there does not seem to be a strong correlation between cancellations and either the destination or the origin airports. Moreover, the origin and destination weather conditions do not appear to have a strong impact on the cancellations either. Then why are the airports with the most cancellations located in Chicago in both years? A possible answer could be the extraordinary adverse weather events that hit Chicago in January 2019 and caused the cancellation of thousands of flights.

In summary, despite the preliminary assumptions, the airports cannot be seen as a cause of a cancellation event. Moreover, extreme events of an unpredictable nature happening in the airport area, such as snowstorms or strikes, have a huge impact on the normal functioning of an airport, and those events, rather than the airport itself, are the most probable cause of a cancellation.